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PREDICTION OF POLYTOMOUS EVENTS: 

MODEL DESCRIPTION, ALGORITHM DEVELOPMENT 
AND METHODOLOGICAL ASPECTS, WITH AN APPLICATION 

I. O’MUIRCHEARTAIGH 
D. P. GAVER 

1 . Introduction 

The prediction of dichotomous events in meteorology (fog/no fog, 
precipitation/no precipitation) has been widely studied. Such predictions are 
also of interest in reliability and survival analysis, and in manpower 
planning. The analysis generally involves logistic regression or 
(equivalently) linear discriminant functions. Most standard statistical 
packages (e.g. BMDP, SAS) provide the facility for performing this analysis in 
some form. Also, the book by Cox (1970) can be consulted. 

A natural extension of this problem (and one which has many potential 
applications in meteorology and elsewhere) is the situation in which the 
predictand is polytomous, i.e. has multiple categories. For example, it might 
be desired to predict visibility (good/marginal /bad) or precipitation 
(none/rain/snow). Two (methodologically) distinct cases can be envisaged, 
viz. 

a. when the predictand involves ordered categories 

b. when the categories are unordered. 

Typically, the former case is the more common in meteorological applications, 
and this is the problem addressed in this report. 

The particular application analyzed here involves 583 records of time to 
formation of tropical storms, and associated values of various meteorological 
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variables. The time to storm formation is polytomized and recorded as 



1*. storm formed within 24 hours 
2*- storm formed between 24 and 48 hours 
3-’ storm formed between 48 and 72 hours 
4: storm did not form 

The five meteorological variables recorded were: 



Xj : unconditional probability of storm formation - a measure of the 

likelihood of storm formation in the particular disturbance 
location at the given time of year. 

Xg : large scale vorticity (computed over 5 Latitude grid). 

X^: divergence (computed over 5 Latitude grid). 

X^: small scale vortici ty (computed over 25 Latitude grid) 

Xj-: local generation of vorticity (product of X^ and X^) . 

Our objective was to determine how much predictive information is provided by 
the meteorological variables to facilitate prediction of imminence of tropical 
storms. Essentially the problem involves regression models where the 
dependent variable is ordinal. Much attention has been devoted to this 
general problem in the statistical literature of recent years (McCullagh 
[1980], McCullagh and Nelder [1983], Green [1984], Anderson [1984]). The 
central concept is that of the generalized linear model (McCullagh and Nelder 
[1983]). 



In section 2 we describe briefly the concept of generalized models, and 
in more detail, the particular model (an extension of [dichotomous] logistic 
regression) utilized for our data. In section 3 we summarize the results of 
an ad hoc application of the model to our data, present the relevant parameter 
estimates, and evaluate the predictive performance of our model. A general 
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discussion of our results is presented in section 4, together with suggestions 
for future work. 



2. The Model . 

2.1 General formulation. 

Following Green [1984] and McCullagh and Nelder [1983], we consider a log 
likelihood L, a function of an n-vector, rj , of predictors. We postulate, in 
our model, that the predictor tj is functionally dependent on the pvector P of 
parameters of interest. For our particular application, tj, and p are 
specified in Section 2.2. The maximum likelihood estimation of P involves 
essentially solution of the equations 



6L 

6P ~ 



T 

Du 



= 0 



(i) 



where u 



. 5L 
nxl 6tj 



and D 



nxp 



_ £n 

~ 6P 



using the notation of Green [1984]. Again following Green, the standard 
Newton-Raphson method for the iterative solution of (1) involves evaluating u, 
D and the second derivatives of L for an initial value of p, and then solving 
the linear equations 



(-^y) (P* ~ P) = D T u (2) 

SPP 



for an updated estimate p . 



Green shows that this is approximately equivalent 
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to the solution of 



(D T AD)(P*-P) = D T u (3) 

T 

where A = E(^- (f^ 1 ) )• 
nxn v <5tj y 6rj J ' 

given an initial estimate of P (about which we have some further comments in 
Section 2.3). Equation (3) can be solved directly for P , or. equivalently, 
if a weighted least squares program is available, P results from regression 
A + D P onto the columns of D using weight matrix A. 

2.2 Specific Formulation. 

In our application the data are in the form of N multinomial samples on 
the same set of k(=4) response categories (e.g. categories 1. 2. 3 and 4 
indicating storm imminence as described in Section 1). The data may be 
arranged as a two-way table of counts y , i=l...N; j=l, ...4. The log- 
likelihood L is then given by 



L = 2 2 y log p 
i j J J 



where 



p^are the cell probabilities. 



4 

and 2 

j=l 



ij 



= 1 



(4) 



In the case where the categories l,2,...,k are ordered, McCullagh and 
Nelder [1983] and Green [1984] both suggest the model 
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(5) 



r) 



ij 



j 

2 

r=l 



ir 



0. Ix. 
i 




i = l , 2 N; j=l .2 k-1 . 



where p. . represent for fixed i, the cumulative cell probabilities, the matrix 
^ J 

(x^) represents covariate information and ^ is a given distribution function. 
Motivation for their model is provided by considering the response variable as 
an arbitrary grouping of an unobservable underlying variable on a continuous 
scale with "cutpoints" 0^ , ***®k-l' some applications the 0^’s will be 

unknown and will need to be estimated; in others, such as the present one, 
they will be known, because the categorical variable (storm imminence coded 
1,2, 3, 4) really is an arbitrary grouping of an underlying continuous variable 
based on known cutpoints (in this case that variable is time to storm 
formation with cut-points 0^=24, 0£=48 0^=72). For our application, we 

chose ^ to be the logistic distribution function, viz. , 



^(x) = 



1 _ 

1+e 



-x 



( 6 ) 



This is the most widely used model in applications and has the advantage that 
a simple transformation can be used to (a) graphically check the suitability 
of the model and (b) provide initial values for the iterative estimation 
process . 
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Our model therefore is 



'ij 



1 + exp - 



M 

0-2xf 3 

J m _i im m 
m= i 

t . 
l 



or, in terms of previous notation 



(7) 



V = v(P) 



( 8 ) 



where 0 = , . . . /3^ , . t^) . Unless we impose constraints, ident i f iabi 1 i ty 

problems can arise. One common expedient (based on the concept of an 
underlying continuous variable used for the classification) is to allow an 
intercept and scale to write 



n ij = 



H 

t W in. P J 
M ^ ) 



i=l....N; j=l 



,k-l . 



(9) 



where 0’s are known. 

A reformulation, more convenient for actual computation 



n u = W 0 * 



P . 9 J 



M 

2 x 
m=l 



irn^m+2^ 



( 10 ) 



2.3 Specific Methodology 

We now apply the general methodology of Section 2.1 to the specific model 



6 



described in equation (9). This involves two steps 

(a) deriving explicit expressions for A, D and u described in that 
section, and 

(b) Finding a suitable starting value for the iterative reweighted 
least squares (IRLS) procedure. 

We describe first the expressions for D, A and u for the special case of 
M=1 covariates with k=4 categories. The extension to other cases is 
straightforward. If we let 



%xl ■ ^ir ^12' V 13' ^21 • V 22 ’W 



di7 

where n = Nx3, then D = ^ is a matrix given by 



D 



nx3 



/ F^.x^ 

F ( 0 2 ,x 1 ) 

F(6 3 .x 1 ) 

F(e r x 2 ) 



e 3 F ( 0 3 ,X l) 

0 1 F ( 0 1,x 2 ) 



X 1 F ( 0 1 ' x i > \ 

x 1 F(0 2 ,x 1 ) 
x 1 F(6 3 .x 1 ) 

x 2 F( 0 r x 2 ) 



F ^ 0 3 ,X N^ 0 3 F ^ 0 3 ,X N^ x N F ^ 8 3’ X N^ 



( 11 ) 



where 



Ftej.xj) = 



exp { 0 ^ P 2 0 i + ^ 3 X i^ 

{ 1 +exp [ p l +P 2 d t + 0 3 x j ] } 2 



Similarly, u , = 3 — is given by 
J nxl ot ) 
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u = 




and A 



nxn 



A 

F ( =t) is given by 

d W 



-V 



12 



T7 11^ T, 12 v lO 



t?i 2 ^n) 

-(^la-nn) 



V 12 ^ll 



A = 



0 



(T, 12 _T, 11 )(T7 13 _T, 12 ) 

1 



^ n 12 
-( 1_ ni 2 ) 



^13 ^12 ^13 T ? 12^ 1 71 13 ^ 



(12) 



(13) 



0 



~^22 

T, 2l^ T, 22~ T) 21^ 



etc . 



0 0 
with tridiagonal 3x3 matrices similar to the one given above along the main 



diagonal, and zeroes elsewhere. 
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Each of A, D and u depends on the unknown parameters p. Given an initial 
value for P we cam evaluate A, D and u and commence the iterative process. 

The initial values can be obtained by noting that if we apply a logit 
transformation to y in equation (10) to give 



£n 




Po + PjGj + j3 2 x. 



(14) 



then these logits are linear in the parameters g. Initial estimates of (3 can 
be obtained by constructing empirical logits. 




= In { 



j i 

(2 y ir ) + | 

r=2 1F * 



n - (2 
r=l 



y ir + b 



-) 



(15) 



and performing an unweighted least squares analysis for the model of equation 
(14) using these empirical logits as the dependent variable. 

Two points should be noted in relation to the estimation of initial 
values for /3. Firstly, the IRLS procedure is quite sensitive to the choice of 
initial value, (see Green (1984)), and a poor choice can lead to 
non-convergence of the algorithm. Secondly, to obtain initial values by this 
procedure, it may be necessary to group the observations into categories based 
on values of the independent var iables(s) . If the data are not so grouped, 
(and our data consists essentially of multinomial samples of size 1). then all 
y. . will be either 0 or 1 in our case this will lead to all the empirical 
logits having value either In 3 or -In 3. 
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Although, it may be necessary to group the data to obtain starting values 
for 0, the maximum likelihood estimation of P may be carried out for either 
the grouped or the ungrouped data. 



2.4 Computational procedure. 

Since a stepwise program was not available, the model given by equation (10) 
was estimated separately for each covariate. Using the deviance (the 
likelihood ratio statistic against the saturated model) as a measure of 
goodness of fit. the best single explanatory variable was X^. Having thus 
determined the optimal single variable for inclusion in the model, we then 
proceeded to establish which, if any, variable should next be included in the 
model. Due to the non-availability of a package for performing this analysis, 
the computation involved was cumbersome; the inclusion of an additional 
variable necessitated the re-programming of the computations leading to the 
matrices/vectors A, D and u. It was determined that was the next variable 
which should be included in the model. A third step of the stepwise procedure 
was also carried out, but no additional variable warranted inclusion in the 
model . 

Accordingly, in evaluating the predictive performance of the model, and 
in comparing this performance with those of discriminant analysis and multiple 
regression, we used only the explanatory variables X^ and X Q . 



3. Evaluation of the predictive performance of the model . 

3.1 Introduction 

The model developed in this paper essentially produces, for given values 
of the covariates, estimates of the cumulative category probabilities. 
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and from these estimates of P^j» the actual category probabilities. Hence, 
the model provides probability forecasts of the four storm imminence 
categories, for a given meteorological situation (as represented by the values 
of the meteorological variables). These probabilistic forecasts can be given 
directly as such, or may be converted, by methods described in Section 3.2, 
into categorical forecasts. 

The problem of evaluating statistical forecasts of this type has been, 
and continues to be, a topic of major interest in meteorology. In Section 
3,3, we consider two very simple methods of evaluating such forecasts; these 
two methods do not necessarily lead to the same conclusions in relation to the 
relative performance of the various forecasts. 

3.2 Use of the Model for Forecasting 

We consider two possibilities- 

(a) Given the estimated probability forecast, a categorical forecast can 
be provided by forecasting the category of maximal probability. We denote 
this forecast by FI. 

(b) The model described by equation (10) (or, more intuitively by 
equation (9)), suggests an underlying continuous variable, say Z, with the 
explanatory variable falling into categories 1,2,3 or 4 accordingly as Z i 0j, 

< Z < 0 2 , 0 2 < Z < 0 3 , Z > 0 3 , respectively. Given values of the 

A 

covariates, and estimates p of P % we can estimate Z by (in the notation of 
equation (10)) 
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(17) 



Z = - 



. M . 
MH e 0 

o . m nH-2 
m=l 



and then provide a categorical prediction that the storm imminence category is 
1, 2, 3 or 4 according as 



Z i 0 t 0j< Z $ 0 2 . 0 2 < Z $ 0 3 . Z > 0 3 

This forecast is denoted by F2. 



3.3 Evaluation of the forecasts 

Since the two forecasts which we are considering here are categorical 
forecasts, one plausible criterion for evaluating these forecasts would be the 
number of correct forecasts achieved. Different forecasts can be readily 
compared using this measure. Since the categories being forecast are ordered, 
an incorrect forecast which is within one category of being correct is 
presumably preferable to one which "misses” by two or more categories. Hence 
an alternative measure of performance would be the number of forecasts which 
are within one category of being correct. We use both of the above measures 
in the paper to compare forecasts. As we will show, the different criteria 
can, in some cases, lead to a different ranking of forecasts. 

In estimating the predictive performance of the model using the above 
measures, we omitted each data point in turn, estimated the model parameters 
from all the remaining data, and then used the forecast procedures FI and F2 
to predict the category of the omitted data point. For comparative purposes. 
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we also used the standard techniques of discriminant analysis and multiple 
regression (using in the latter case as dependent variable the coded imminence 
of tropical storm variable, which takes values 1, 2, 3 and 4. 

4. Discussion 

The results of the cross-validation procedure, described in Section 3.3, 
for forecast methods FI and F2. and for discriminant analysis and multiple 
regression, are given in Tables 1,2,3 and 4 respectively. An overall summary 
of the relative performance of the various methods is presented in Table 5. 

We would emphasise that any conclusions drawn here in relation to the efficacy 
of the various procedures are valid only in relation to the present 
application. Broader statements about the general performance of these 
methods would require extensive further analysis. 

It is clear from Table 5 that no single technique is clearly superior. 
Using the criterion of maximising the number of correct forecasts, the 
generalized linear model applied in this paper, with forecasting strategy F2, 
is the best among those considered. However, if maximising the numbers of 
forecasts correct within one category is chosen as the comparative criterion, 
then multiple regression emerges as the optimal methodology. 

From Table 5 it is clear that the performance of discriminant analysis 
is, in this application at least, somewhat inferior to that of the other 
techniques. A comparison of Tables 1 and 2 (which use the same model but 
different forecasting strategies based on the model) reveals some interesting 
facets about each of these strategies. Procedure FI always forecasts either 
category 1 or category 4, and produces the highest overall percentage of 
correct forecasts. Procedure F2 is less extreme, and "smears the category 1 
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forecasts over categories 1 and 2, and the category 4 forecasts over 
categories 3 and 4; this has the effect of reducing the percentage of correct 
forecasts but increasing the percentage of forecasts correct to within one 
category. Multiple regression "smears” the forecasts still further, with a 
resultant decrease in the percentage of correct forecasts and increase in the 
percentage correct to within one category. Multiple regression is, in fact, 
the method which produces the highest overall percentage of forecasts correct 
to within one category. 

It is clear that the choice of strategy (forecasting procedure) among 
those considered and described here will be greatly influenced by the relative 
importance/seriousness of the various correct/incorrect forecasts. This 
strongly suggests that a decision theoretic approach might be considered, 
although, in practice, the specification of a loss function may be difficult 
and may unduly influence the choice of strategy. 
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Table 1 

Forecast Procedure: FI 





Number of Forecasts 


True 

State 


Forecast State 


i 


2 


3 


4 


Total 


1 


27 


0 


0 


34 


61 


2 


10 


0 


0 


32 


42 


3 


4 


0 


0 


20 


24 


4 


11 


0 


0 


445 


456 


total 


52 


0 


0 


531 


583 



Table 2 

Forecast Procedure: F2 





Number of Forecasts 


True 

State 


Forecast State 


i 


2 


3 


4 


Total 


i 


18 


9 


7 


27 


61 


2 


8 


2 


5 


27 


42 


3 


1 


3 


3 


17 


24 


4 


4 


7 


17 


428 


456 


total 


31 


21 


32 


499 


583 
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Table 3 

Forecast Procedure: Discriminant Analysis 





Number of Forecasts 


True 

State 


Forecast State 


i 


2 


3 


4 


Total 


i 


27 


12 


16 


6 


61 


2 


14 


5 


9 


14 


42 


3 


3 


3 


14 


4 


24 


4 


14 


26 


62 


354 


456 


total 


58 


46 


101 


378 


583 



Table 4 

Forecast Procedure: Multiple Regression 





Number of Forecasts 


True 

State 


Forecast State 


i 


2 


3 


4 


Total 


i 


4 


21 


31 


5 


61 


2 


0 


11 


21 


10 


42 


3 


0 


2 


18 


4 


24 


4 


0 


11 


110 


335 


456 


total 


4 


405 


180 


354 


583 



16 



Table 5 



Forecast 

Procedure 


% Correct 
Forecasts 


% Forecasts 
Correct with 
one Category 


FI 


81 


86 


F2 


77 


88 


Discriminant 

Analysis 


69 


86 


Multiple 

Regression 


63 


90 



17 



5. References 



Anderson, J. A. (1984), Regression and Ordered Categorical Variables, J. R. 
Statist. Soc. , B,46, No. 1, pp. 1-30. 

Cox, D. R. (1970), Analysis of Binary Data, London^ Methuen. 

Green, P. J.. (1984), Iterated Rewighted Least Squares for Maximum Likelihood 
Estimation and Some Robust and Resistant Alternatives, J. R. Statist. 
Soc., B. 46, No. 2, pp. 149-192. 

McCullagh P. (1980), Regression Models for Ordinal Data, J. R. Statist. Soc., 
B, 42, No. 2, pp. 109-142. 

McCullagh P. and Nelder, J. A. (1983), Generalized Linear Models. London* 
Chapman and Hall. 



18 



DISTRIBUTION LIST 



Library (Code 0142) 

Naval Postgraduate School 
Monterey, CA 93943-5000 

Defense Technical Information Center 
Cameron Station 
Alexandria, VA 22314 

Office of Research Administration (Code 012) 
Naval Postgraduate School 
Monterey, CA 93943-5000 

Center for Naval Analyses 
4401 Ford Ave. 

Alexandria, VA 22302-0268 

Library (Code 55) 

Naval Postgraduate School 
Monterey, CA 93943-5000 

Operations Research Center, Rm E40-164 
Massachusetts Institute of Technology 
Attn: R. C. Larson and J. F. Shapiro 
Cambridge, MA 02139 

Koh Peng Kong 
Ministry of Defense 
Blk A, Stockport Road 
SINGAPORE 0511 



NO. OF COPIES 
2 



2 



1 



1 



1 



1 



1 



DUDLEY KNOX LIBRARY 




3 2768 00347547 6 



